APPLICATION OF MOLECULAR MODELING TO BIOLOGICAL PROCESSES


1. INTRODUCTION


This study concerns molecular modeling of processes that are known to occur on cell surfaces. Detailed structural information is now becoming available at the molecular level.^ This study provides information about molecular structure and interactions known to affect fundamental biochemistry at cell surfaces.^ Current computational techniques have extended molecular modeling to large-scale assemblies of molecules that simulate biological materials and structures.^ Interpreting this structural information with modeling techniques is the first step toward understanding the effects of surface chemistry on membrane performance. Knowledge of these interactions is a step toward the goal of a useful model of the cell surface membrane.


Computational modeling is the primary tool for integrating detailed information on molecular structure and interactions into knowledge of complex biological processes.^ This technique is based on developments in theoretical chemistry, whose applications have been made possible by corresponding developments in computational science.^ Early successes in calculating atomic and molecular structure^ have been used as building blocks for models of macromolecules of biological interest^, and these macromolecules have in turn been assembled in simulations that represent biological materials.^ Molecular modeling is a systematic approach for understanding complex molecular organization. Computational techniques organize large amounts of chemical information into tools that can explain molecular phenomena (e.g., self-assembly) that are the basis for the functioning of biological structures (e.g., cell walls and membranes).

Modeling these membranes is important for understanding basic cellular processes. The membrane is a complex structure composed of proteins, lipids, complex carbohydrates, and distinct chemical species (e.g., glycolipids and glycoproteins) that combine the biochemistry of these basic building blocks. Understanding the biochemistry of these complex materials at the molecular level will provide the fundamental knowledge needed to build a more complete model that addresses the effect of environmental influences on cellular behavior. Specific phenomena (e.g., blood type identification, action of toxins, and cell fertilization and division) are related to specific molecular processes. Broad areas (e.g., cell recognition, adhesion, biochemical transport, and cell movement) are also recognized as being related to particular types of chemical processes. Successful detailed molecular modeling of membranes and cell surfaces would be a breakthrough in understanding such general concerns as identification, survivability, detection, and other features important to biodefense.

The scope of this study is molecular modeling of sialic acid to study its binding to wheat germ agglutinin isolectin 1 (WGA1). This biochemical process has important implications for studies on the binding of toxins, on structural effects on cell walls, and on other biological phenomena. This report presents results of semi-empirical molecular orbital calculations on N-acetylneuraminic acid (NeuNAc). These results yield information about molecular structure, charge, and electronic distribution. This information will be related to a crystallographic study of WGA1.^


2. MOLECULAR MODELING

Computational molecular modeling is based on the observation that complex phenomena in the physical world can be described by theoretical and mathematical methods. "The more progress physical sciences make, the more they tend to enter the domain of mathematics, which is a kind of center to which they all converge. We may even judge the degree of perfection to which a science has arrived by the facility with which it may be submitted to calculation."^ Current capabilities allow quantitative calculation of molecular properties with precision comparable to experimental methods. Recent advances in computational algorithms and large increases in available computing power make computational methods a unique tool for dealing with the large systems that form the molecular basis for biological materials and structures. Understanding the chemistry of large molecules such as proteins, and of intricate mechanisms such as receptor docking, requires facilities such as modern workstations and computational chemistry systems that provide computational speed and graphics visualization.

There are many techniques for computational modeling; some important methods are discussed below. The applicability of these methods depends on the capacity of the computational algorithms and the available computational resources. There is a direct trade-off between accuracy and the number of atoms in the molecule: larger numbers of atoms necessitate more approximate treatment. Table 1 summarizes the size limits of each method and its relative requirements for computational resources.
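The trade-off can be made concrete with nominal scaling laws. The sketch below assumes that computational cost grows as a power of system size N, using commonly quoted formal exponents (about N^4 for conventional ab initio Hartree-Fock, N^3 for semi-empirical methods dominated by matrix diagonalization, and N^2 for a molecular mechanics energy with all nonbonded pairs). These exponents are rules of thumb assumed for illustration; real programs deviate from them through integral screening, cutoffs, and other techniques.

```python
# Nominal power-law exponents: textbook rules of thumb assumed for this sketch,
# not measured timings for any particular program.
NOMINAL_EXPONENTS = {
    "ab initio (Hartree-Fock)": 4,
    "semi-empirical (matrix diagonalization)": 3,
    "molecular mechanics (all nonbonded pairs)": 2,
}


def relative_cost(n_small, n_large, exponent):
    """Factor by which cost grows from n_small to n_large atoms,
    assuming cost is proportional to N**exponent."""
    return (n_large / n_small) ** exponent


# Doubling a molecule from 25 to 50 atoms (basis size assumed proportional to atom count).
for method, p in NOMINAL_EXPONENTS.items():
    print(f"{method:42s} x{relative_cost(25, 50, p):.0f}")
```

Under these assumptions, doubling the molecule multiplies the nominal ab initio cost by 16 but the molecular mechanics cost only by 4, which is why larger systems are pushed toward the more approximate methods in Table 1.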

Ab initio quantum chemistry is a computational technique that depends only on solving the Schrödinger equation using physical constants and approximations for the wavefunction involving linear combinations of atomic orbitals (LCAO).^ Current capabilities allow routine calculations on molecules with up to 25 atoms. These methods are most readily available in the commercial GAUSSIAN series of programs. Ab initio methods provide intricate details about molecular electronic structure that are needed to understand the complex reactions in molecules such as enzymes. These methods allow the calculation of molecular properties (e.g., electron density and electrostatic potential surfaces) that give information about the structures and reactive properties of small molecules and of active sites in biological macromolecules. They are the principal techniques for studying the fine details of bond breaking. Ab initio methods also give fine details about intermolecular interactions such as hydrogen bonding, a central phenomenon in biological processes, and they have been used to understand the spectroscopy of biological molecules in water. These methods are useful in understanding the complex structural transformations in biological macromolecules that are central mechanisms for their activity.
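The LCAO idea can be illustrated with the smallest possible case: two identical atomic orbitals φ1 and φ2 combined as ψ = c1φ1 + c2φ2. Requiring the energy to be stationary leads to the secular equation |H − ES| = 0. The sketch below solves that equation for a 2×2 problem. The Coulomb integral α, resonance integral β, and overlap S are hypothetical parameters chosen for illustration, not values from GAUSSIAN or from this study; a real ab initio program evaluates these integrals from a basis set rather than assuming them.

```python
import math


def two_orbital_lcao(h11, h22, h12, s12):
    """Return the two orbital energies (ascending) from the 2x2 secular
    equation |H - E S| = 0 with normalized basis functions (S11 = S22 = 1).

    Expanding the determinant gives a quadratic a E^2 + b E + c = 0.
    """
    a = 1.0 - s12 ** 2
    b = -(h11 + h22) + 2.0 * h12 * s12
    c = h11 * h22 - h12 ** 2
    disc = math.sqrt(b * b - 4.0 * a * c)
    return sorted(((-b - disc) / (2.0 * a), (-b + disc) / (2.0 * a)))


# Hypothetical parameters (eV) for two identical atomic orbitals.
alpha, beta, overlap = -10.0, -3.0, 0.2
e_bond, e_anti = two_orbital_lcao(alpha, alpha, beta, overlap)

# Closed-form check for the homonuclear case: E = (alpha +/- beta) / (1 +/- S).
assert math.isclose(e_bond, (alpha + beta) / (1 + overlap))
assert math.isclose(e_anti, (alpha - beta) / (1 - overlap))

# With two electrons the bonding orbital is doubly occupied (the HOMO)
# and the antibonding orbital is empty (the LUMO).
print(f"HOMO {e_bond:.3f} eV, LUMO {e_anti:.3f} eV, gap {e_anti - e_bond:.3f} eV")
```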


Table 1. Comparison of Various Modeling Techniques

Method                      Characteristics

Ab initio methods           Most accurate, most detail; resource intensive;
                            small molecules.

Semi-empirical methods      Approximate electronic properties; less accurate
                            molecular properties (e.g., vibrational
                            frequencies); rapid calculations; molecules up to
                            100 atoms.

Molecular mechanics         Rapid approximate structures; macromolecules with
                            thousands of atoms; no electronic information.

Molecular dynamics          Based on molecular mechanics; computationally
                            intensive; long calculations for averages over many
                            structures and trajectories; approximate information
                            on molecular motions; molecule size similar to
                            molecular mechanics but more severely limited by
                            available computational resources.

Theoretical linear          Rapid calculations for specialized molecular
solvation energy            properties; requires experimental information for a
relationships               series of related molecules; molecule size limited
                            by availability of data and capability of modeling
                            the desired properties.
Semi-empirical methods use experimentally derived information to approximate the more exact solutions of the molecular wavefunction. These methods are characterized by rapid calculations and economical use of computing resources; thus, semi-empirical methods can be used on molecules much larger than those accessible to ab initio methods. Semi-empirical computations are primarily available in MOPAC.^ This computational package has facilities for approximate calculation of most of the molecular structure and property features available with ab initio methods. The semi-empirical approximations allow economical calculation of the changing molecular structure along a reaction path, thus allowing detailed analysis of the fundamental chemistry. Semi-empirical methods have found wide application to molecular interactions and to large-scale systems such as polymers.


Molecular mechanics is a method that calculates molecular structure, energy, and vibrational properties using classical force field theory and empirically derived force fields. Current techniques allow rapid and accurate structure calculations for small molecules. Molecular mechanics techniques are essential for understanding the structure of large-scale biomolecules with thousands of atoms. Recent breakthroughs in computational graphics have been coupled with molecular mechanics to perform visual chemistry (e.g., docking of a drug with a receptor site).
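A molecular mechanics energy is a sum of simple classical terms. The sketch below shows two of the most common: a harmonic bond-stretch term and a 12-6 Lennard-Jones term for nonbonded atom pairs. The functional forms are standard, but the parameter values (k, r0, epsilon, sigma) and the bond lengths scored are illustrative assumptions, not values from a specific published force field. Conventions also differ; some force fields include a factor of 1/2 in the bond term.

```python
def bond_stretch_energy(r, r0, k):
    """Harmonic bond term E = k (r - r0)^2 (convention without a factor of 1/2)."""
    return k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 nonbonded term E = 4 eps [(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


# Illustrative (assumed) parameters in kcal/mol and angstroms.
K_CC, R0_CC = 310.0, 1.526
bonds = [1.541, 1.537, 1.555]  # hypothetical C-C distances to score
e_bonds = sum(bond_stretch_energy(r, R0_CC, K_CC) for r in bonds)
print(f"bond-stretch energy: {e_bonds:.3f} kcal/mol")

# The Lennard-Jones minimum lies at r = 2**(1/6) * sigma with depth -epsilon.
r_min = 2 ** (1 / 6) * 3.4
print(f"LJ energy at its minimum: {lennard_jones(r_min, 0.1, 3.4):.4f} kcal/mol")
```

A full force field adds angle-bending, torsional, and electrostatic terms and sums every term over all bonds, angles, and atom pairs in the molecule; minimizing that total with respect to the coordinates gives the molecular mechanics structure.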


Molecular dynamics is a computational tool for understanding the motions of molecules. It is based on molecular mechanics and Newtonian equations of motion.^ This method allows simulation of changes in molecular structure associated with chemical reactions and biological activity. Free energy perturbation techniques use molecular dynamics to calculate thermodynamic quantities associated with structural and chemical changes. Molecular dynamics is also used to understand the role of water in biological processes. Concerted use of molecular dynamics with experimental data from x-ray crystallography is a new technique for determining the structure of large macromolecules.
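Newton's equations of motion are normally integrated in small time steps. A minimal sketch of one widely used scheme, velocity Verlet, is shown below for a single particle on a harmonic spring, a stand-in for one bond-stretch term. The spring constant, mass, and time step are arbitrary reduced units assumed for illustration, not parameters from any biomolecular force field.

```python
def velocity_verlet(x0, v0, mass, force, dt, n_steps):
    """Integrate m * x'' = F(x) with the velocity Verlet scheme.

    Returns lists of positions and velocities, including the initial state.
    """
    xs, vs = [x0], [v0]
    x, v = x0, v0
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt  # position update
        a_new = force(x) / mass             # force at the new position
        v = v + 0.5 * (a + a_new) * dt      # velocity uses old and new accelerations
        a = a_new
        xs.append(x)
        vs.append(v)
    return xs, vs


# Toy model in arbitrary reduced units (assumed): a harmonic spring standing in
# for a single bond-stretch term of a force field.
K, M = 1.0, 1.0


def spring_force(x):
    return -K * x


xs, vs = velocity_verlet(x0=1.0, v0=0.0, mass=M, force=spring_force, dt=0.01, n_steps=1000)
energies = [0.5 * M * v * v + 0.5 * K * x * x for x, v in zip(xs, vs)]
drift = max(energies) - min(energies)
print(f"initial energy {energies[0]:.4f}, max fluctuation {drift:.2e}")
```

Near-constant total energy over many steps is a common sanity check on the time step; a production simulation applies the same update to every atom, with forces taken from the full force field.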


Theoretical Linear Solvation Energy Relationships (TLSER) is a method for correlating and predicting physical and biological activities with microscopic thermodynamic parameters. It is based on Quantitative Structure Activity Relationship (QSAR) methods, which have been widely used in medicinal chemistry. The development of modern semi-empirical methods allows systematic integration of experimental data with computational results. The efficiency of these techniques allows their application to large molecules associated with enzyme activity and with transport across cell membranes.^
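At its core, a TLSER or QSAR model is a multiple linear regression of a property against a few computed descriptors, of the general form log(property) = c0 + c1·d1 + ... + ck·dk. The sketch below fits such a model by ordinary least squares using only the Python standard library. The descriptors d1 and d2 and the data are hypothetical: the responses are generated from a known equation so that the fit can be checked, whereas real work would use descriptors computed (for example, with a semi-empirical method) for a series of related molecules together with their measured properties.

```python
def fit_linear_model(descriptors, responses):
    """Ordinary least squares for y = c0 + c1*d1 + ... + ck*dk.

    descriptors: list of rows [d1, ..., dk]; responses: list of y values.
    Solves the normal equations (X^T X) c = X^T y by Gaussian elimination.
    """
    rows = [[1.0] + list(d) for d in descriptors]  # prepend the intercept column
    n = len(rows[0])
    # Augmented normal-equation matrix [X^T X | X^T y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(n)]
         + [sum(r[i] * yi for r, yi in zip(rows, responses))]
         for i in range(n)]
    for col in range(n):  # forward elimination with partial pivoting
        pivot = max(range(col, n), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for i in range(col + 1, n):
            f = a[i][col] / a[col][col]
            for j in range(col, n + 1):
                a[i][j] -= f * a[col][j]
    coef = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (a[i][n] - s) / a[i][i]
    return coef


# Synthetic, noise-free data from y = 1.0 + 2.0*d1 - 0.5*d2 (hypothetical descriptors),
# so the fit should recover the coefficients [1.0, 2.0, -0.5].
data = [(0.0, 1.0), (1.0, 0.0), (2.0, 3.0), (3.0, 1.0), (1.5, 2.5)]
y = [1.0 + 2.0 * d1 - 0.5 * d2 for d1, d2 in data]
coef = fit_linear_model(data, y)
print([round(c, 6) for c in coef])
```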
3. MOLECULAR BASIS FOR BIOLOGICAL PROCESSES

Molecules are the fundamental building blocks of biological materials and structures. The chemical basis for many central biological processes is now understood; common molecular patterns and principles underlie the diverse expressions of life.^ Molecular structure and function provide the fundamental information for understanding biochemical processes. The activity of biological macromolecules is directly related to their conformation, which is determined by their molecular composition. The three-dimensional structure of proteins is uniquely determined by the sequence of the amino acid residues, and this controls such processes as the unique specificity of enzymes. Macromolecule biosynthesis is governed by the structure and interaction of the molecular building blocks; for example, these govern the rates at which biological materials are made. Information transmission and storage in genetic materials is determined by the sequence of base pairs. The generation and storage of metabolic energy often depend on small rearrangements of molecular conformation. Computational modeling provides tools for integrating information about molecules in order to understand the complex chemistry of biological processes. Examining some basic types of molecules in the cell makes it possible to consider the kinds of information computational modeling provides for problems in biological defense.

3.1 Genetic Materials.

DNA and RNA are polymers of nucleotides. They contain sugars and phosphate groups that form the structural elements, and purine and pyrimidine bases that convey the genetic information. DNA is the storehouse for the information that ultimately governs the synthesis of all biomolecules; it is generally found in the Watson-Crick double helix. Computer graphics and simulation studies have shown that intercalation of flat aromatic rings is a major mechanism for mutations. Docking studies using molecular mechanics show that molecules such as acridine can insert through the edge of the DNA helix. This displaces the stacking of the bases and leads to insertion or deletion of bases. The effect of such mutations is to shift the reading frame in translation. Computational modeling is also used to understand the binding of drugs to groove sites in the DNA helix. Recent advances suggest that twists in the conformation of the base pairs strongly influence hydrogen bonding.^
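Why an inserted base is so disruptive can be seen by splitting a coding sequence into codons before and after adding one base. The sequence below is hypothetical and the helper name is illustrative; the point is that every codon downstream of the insertion is regrouped.

```python
def codons(seq, frame=0):
    """Split a nucleotide string into complete triplets starting at `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


original = "ATGGCTTCAGGA"                   # hypothetical coding sequence
mutant = original[:4] + "T" + original[4:]  # single-base insertion after base 4

print(codons(original))  # ['ATG', 'GCT', 'TCA', 'GGA']
print(codons(mutant))    # ['ATG', 'GTC', 'TTC', 'AGG']
```

Only the codon before the insertion survives unchanged; a deletion of one base shifts the frame in the opposite direction.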

3.2 Proteins.

Proteins are the basic catalysts for biochemical reactions. Detailed information about their structure is available from x-ray crystallography and molecular modeling. Computational techniques provide information about molecular properties that is useful for interpreting the mechanisms of the catalytic reactions. Recent work has used calculations of electrostatic fields to interpret the interactions and docking mechanisms for electron transfer in metalloproteins. Matching the magnitudes and directions of the calculated electrostatic vectors for plastocyanin and cytochrome c provided the relative orientation of the two proteins and enabled detailed analysis of the binding between them.^

3.3 Lipids.

Lipids are relatively small biological molecules that are characterized by their insolubility in water and generally contain long fatty acid chains 12-24 carbon atoms in length. They are characterized by a high degree of conformational disorder and the absence of subunit connectivity. For these reasons, techniques such as x-ray crystallography yield less information about their molecular structure in the natural state. Computational modeling of lipids has focused primarily on packed layers, because of the flexibility of the individual molecules and the interest in the self-assembled materials that make up the plasma membrane. Recent advances in large-scale molecular mechanics and graphics visualization techniques have shown that a 12-molecule layer of cholic acid, a lipid simulant, spontaneously forms irregular channels in an approximately helical structure.^ From the dimensions of the channels, it is apparent that guest molecules can be incorporated into a cholic acid micelle through more than one channel.

3.4 Carbohydrates.

Complex carbohydrates are macromolecules built from simple sugars. The many hydroxyl groups on the sugars give rise to different stereochemical linkages between the monomer units, often with branching to form multiple chains in a single polymer. The sequence of sugars in complex carbohydrates can serve as code words in the molecular language of life, and this code is the basis for many of the regulatory and recognition functions on the surfaces of cells. Because the three-dimensional structure of complex carbohydrates is often ill-defined, these molecules are a challenging class of problems for computational techniques. Recent work using semi-empirical methods calculated the structures and charge distributions of some glucosidase inhibitors. Differences in charge density explained the inhibition properties, while structural changes were correlated with lack of binding to the enzyme.^

3.5 Biological Materials.

The application of molecular modeling techniques to basic biological macromolecules provides the foundation for constructing models of more complicated systems (e.g., membranes). Membranes consist mainly of lipids and proteins; they also contain carbohydrates that are linked to the proteins and lipids. A goal is to assemble computational models of the various components to simulate the function of the cell surface membrane. Some preliminary results show the feasibility of this idea. Molecular mechanics and molecular dynamics, coupled with new mathematical techniques, have been used to model lipid bilayers^ and micelles. Carbohydrate interactions with a lipid bilayer have been modeled.^ Molecular modeling results have been used to interpret experimental data for a lipopolysaccharide membrane, and the electrostatic properties of a bacterial porin have been used to understand ion transport across the outer membrane of a Gram-negative bacterium.


4. CRYSTALLOGRAPHIC STRUCTURE

For those proteins that crystallize, x-ray diffraction techniques provide three-dimensional atomic coordinates. This information is the basis for many computational modeling techniques because it makes it possible to visualize chemical sites and active processes at specific locations in the protein. The reliability of this information with respect to the structure, and thus the function, of proteins in water (their naturally active situation) is always in question because of possible changes in conformation, influences of crystal lattice contacts, and preferential orientation of hydrogen-bonded waters in the crystal. It has nevertheless been concluded that static x-ray coordinates contribute to understanding enzyme mechanisms, for the following reasons:


• The structures of many proteins have been determined repeatedly under many conditions, giving identical results.

• Related proteins invariably have the same polypeptide chain folding.

• Crystal lattice contacts are too weak and too limited in area to disturb the protein folding to any significant extent.

• The properties of molecules in solution can usually be explained by the crystal structure.

• Active molecules (e.g., enzymes) often retain full catalytic activity in the crystal, especially if no large conformational changes occur.

The usefulness of crystallographic data for studying enzyme sites is important to the whole range of modeling problems for biological materials. It is essential to recognize that the well-defined active sites and tightly bound complexes found in the crystal structures of enzymes are representative models of reactions, even though biological materials in solution are in continual motion.


4.1 WGA1.


This study is based on x-ray crystallographic coordinates for a complex of WGA1 and N-acetylneuraminyl-lactose [NeuNAc-α(2-3)-Gal-β(1-4)-Glc] (13). WGA1 is a plant lectin, a member of a class of molecules that are quantitated by their ability to agglutinate erythrocytes. Oligomeric plant lectins, which display stringent sugar specificity, constitute a special class of these proteins. As a representative of the highly conserved group of lectins in the Gramineae family, WGA1 possesses several properties distinct from other plant lectins. The most important is its specificity for two different types of saccharides, N-acetyl-D-glucosamine (GlcNAc) and NeuNAc (α anomer). In addition, sugar binding requires two isostructural domains, and there are two independent, noncooperative binding sites. These features are important for linkages in surface binding.*

WGA1 exhibits specificity only for acetylated sialic acid sugars, which has been explained on the basis of structure.^ NeuNAc satisfies the stereochemical requirement for an equatorial N-acetyl group and an adjacent equatorial OH group. N-acetylneuraminyl-lactose is an important receptor analog. In the complex, only one binding site is occupied, and it is determined to be the primary site; the secondary site is unoccupied because of the charge on the NeuNAc moiety. In the binding site, the N-acetyl group makes the largest number of contacts, contributing both hydrogen bonds and an important nonpolar interaction between its methyl group and the phenyl ring of Tyr 73.^ Molecular modeling studies on NeuNAc will yield information about the nature of these interactions and about molecular specificity.

4.2 Sialic Acids.

Interest in sialic acids has increased with the recognition of their involvement in regulating a large number of biological phenomena. Sialic acids play a strong protective role in living cells and organisms. This appears to result from their peripheral position in glycoconjugates and, correspondingly, their frequent external location on cell membranes.


*Wright, C.S., J. Biol. Chem. (in press).
Figure 1 shows the basic structure diagram for neuraminic acid. This is a nine-carbon sugar characterized by a carboxylic acid group at position 2 and a glycerol tail originating from position 6. The name sialic acid refers to the class of all N- and O-acyl derivatives of neuraminic acid. The most common is NeuNAc, with an N-acetyl substituent at the 5 position. It occurs in both the α and β anomeric forms, defined with respect to the carboxylate group, and it is the molecule associated with the biological phenomena discussed below. There are many other substituted sialic acids; the 5-glycolyl substituted molecule (Neu5Gc), but not NeuNAc, is associated with the Hanganutziu-Deicher antibodies. There is a large literature on other forms, in particular the recent synthesis work of Hartmann and Zbiral at the University of Vienna. Unsubstituted neuraminic acid is not known to exist as a free molecule.^


Figure  1.  Substitution  Diagram  for  Natural  Sialic  Acids. 

N-  and  O-Substituents  occur  at  the  corresponding 
positions  R-R4,  where  the  most  common  molecule  is 
NeuNAc,  with  an  acetyl  group  at  position  and 

hydrogens  elsewhere. 


Sialic acid residues exist as negatively charged carboxylate ions in the biological environment. Because negatively charged sialic acid residues accumulate on cell membranes, these compounds may be expected to strongly influence the behavior of cells. For example, this charge prevents the aggregation of blood platelets and erythrocytes through electrostatic shielding, whereas in other cells (e.g., chick embryo muscle cells) it appears to facilitate aggregation, possibly through Ca2+ bridges. In studies on sarcoma cells, the repulsive electrostatic forces of sialic acids contribute to the rigidity of the cell surface. Sialic acid residues also influence the viscosity of glycoproteins.
Another important function is known as the anti-recognition effect. Ashwell and Morell^ discovered that sialic acids mask the D-galactosyl residues of various serum glycoproteins and thus prolong their survival in the blood. After enzymatic removal of the sialic acids, these molecules are rapidly cleared by the liver and removed from circulation.

Another anti-recognition use of sialic acid is exhibited by the K1 strain of Escherichia coli. The cell makes long polymers of sialic acid, known as colominic acid, which cover its surface and mask its presence from the complement pathway.^

Sialic acid is also known as the receptor molecule for the influenza virus and for cholera toxin. Cholera toxin consists of two main parts: a catalytic unit, the A chain, and a membrane penetration unit that consists of five B chains. The B chains bind to the monosialoganglioside GM1 and are responsible for penetrating the membrane.

Sialic acid is associated with an important transmembrane protein, glycophorin A. This is a single polypeptide chain with 16 attached oligosaccharide units. It is known that all of these molecules integral to the membrane point in the same direction.^ Sialic acid is part of the attached oligosaccharides and is responsible for giving red blood cells a very hydrophilic, anionic coat.^ Figure 2 is a schematic diagram illustrating the complex structural interactions between lipids, proteins, and carbohydrates that influence the molecular characteristics of cell surfaces. The carbohydrates are known to exist on the outside of the cell; the diagram is presumed to represent the structural elements of glycosylated transmembrane proteins such as glycophorin A.

5. METHODOLOGY

N-acetylneuraminic acid is a large molecule with 11 carbon atoms, 9 oxygen atoms, 1 nitrogen atom, and 19 hydrogen atoms (C11H19NO9). Because this exceeds the roughly 25-atom limit for routine ab initio calculations, semi-empirical methods are best suited for molecular orbital studies on this molecule. The geometries of NeuNAc and the NeuNAc⁻ ion were optimized with the PM3 method as implemented in MOPAC 5.06.^ Because the molecule is large and floppy, the PRECISE option was used to ensure convergence. Initial geometries were generated using the in-house Chemview visualization package on an Ardent Titan. Final geometrical information was obtained using routines incorporated into the Molecular Modeling, Analysis, and Display System (MMADS) molecular modeling package. The molecular orbitals and electrostatic potentials were visualized using Chemview and the SPARTAN system (Wavefunction, Incorporated, Irvine, CA).
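A MOPAC job is driven by a plain-text input deck: a keyword line, two title/comment lines, and then one line per atom. The helper below is a hypothetical sketch that assembles such a deck from Cartesian coordinates, writing an optimization flag after each coordinate. PM3, PRECISE, and CHARGE=-1 are standard MOPAC keywords (the last would apply to the NeuNAc⁻ ion); the function name, the two-atom fragment, and the exact column layout are assumptions for illustration, and the geometry conventions (internal versus Cartesian coordinates) should be checked against the manual for the MOPAC version in use.

```python
def mopac_input(keywords, title, atoms, optimize=True):
    """Assemble a MOPAC-style input deck as a string (hypothetical helper).

    keywords: keyword line, e.g. "PM3 PRECISE"
    atoms: list of (element, x, y, z) Cartesian coordinates in angstroms
    optimize: write flag 1 (optimize) or 0 (freeze) after each coordinate
    """
    flag = 1 if optimize else 0
    lines = [keywords, title, ""]  # keyword line, title line, (blank) comment line
    for element, x, y, z in atoms:
        lines.append(f"{element:2s} {x:12.6f} {flag} {y:12.6f} {flag} {z:12.6f} {flag}")
    return "\n".join(lines) + "\n"


# Hypothetical two-atom fragment only: a real NeuNAc deck would list every atom
# of the initial geometry.
fragment = [("C", 0.0, 0.0, 0.0), ("O", 1.43, 0.0, 0.0)]
print(mopac_input("PM3 PRECISE CHARGE=-1", "NeuNAc anion (fragment)", fragment))
```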
6. RESULTS AND DISCUSSION

6.1 Geometry.

Preliminary results are shown in Table 2, which compares bond lengths for the PM3-optimized geometry with those from a crystallographic study. The optimized structure and atom numbering for NeuNAc are displayed in Figure 3.


Figure 2. Schematic Representation for a Glycosylated Transmembrane Protein. In Glycophorin A, the Carbohydrate Residues Contain Terminal Sialic Acid Groups as Represented in the Diagram^
Table 2. NeuNAc Comparison of Bond Lengths Between PM3 and Experimental Results

Bond        PM3 Geometry (Å)    Experimental (Å)

O1-C1       1.4066              1.420
C1-C2       1.5410              1.535
C2-C3       1.5368              1.519
C3-C4       1.5545              1.517
C4-C5       1.508               1.532
C5-O1       1.4290              1.440
C5-C6       1.5646              1.517
C6-C7       1.5582              1.532
C7-C8       1.5506              1.512
C1-C9       1.5551              1.531
C10-C11     1.4996              1.494
C1-O2       1.3981              1.400
C9-O3       1.3526              1.290
C9-O4       1.2128              1.198
C3-O5       1.4170              1.434
C10-O6      1.2228              1.237
C6-O7       1.4080              1.428
C7-O8       1.4165              1.432
C8-O9       1.4006              1.428
C4-N1       1.4895              1.452
N1-C10      1.4327              1.330

Figure 3. Optimized Structure for NeuNAc Using Semi-Empirical Methods and the PM3 Hamiltonian
Because the parameterization of the PM3 Hamiltonian is derived from averages over many types of experimental data and a large variety of molecules, the rough agreement between the PM3-optimized geometry and the x-ray crystallographic structure is what is expected. As with any computational technique, the differences tend to be systematic with respect to atom type: the calculated C-C distances are generally too long, the C-O single-bond distances are too short, and the C=O and C-N distances appear more dependent on the local chemical environment. This is a result of the effects of nonspherical atoms and polarization of the electron distribution, problems faced by all computational techniques.
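The systematic trends described above can be checked directly from the Table 2 values by grouping the bonds by type and averaging the signed deviations (PM3 minus experiment). The grouping into classes is an editorial assumption made for this sketch: the carboxyl C9-O3 bond is placed with the C=O bonds, and the hydroxyl bond with lengths 1.4165 and 1.432 Å is labeled C7-O8.

```python
from statistics import mean

# (bond, PM3 length, experimental length) in angstroms, transcribed from Table 2.
TABLE_2 = [
    ("O1-C1", 1.4066, 1.420), ("C1-C2", 1.5410, 1.535), ("C2-C3", 1.5368, 1.519),
    ("C3-C4", 1.5545, 1.517), ("C4-C5", 1.508, 1.532), ("C5-O1", 1.4290, 1.440),
    ("C5-C6", 1.5646, 1.517), ("C6-C7", 1.5582, 1.532), ("C7-C8", 1.5506, 1.512),
    ("C1-C9", 1.5551, 1.531), ("C10-C11", 1.4996, 1.494), ("C1-O2", 1.3981, 1.400),
    ("C9-O3", 1.3526, 1.290), ("C9-O4", 1.2128, 1.198), ("C3-O5", 1.4170, 1.434),
    ("C10-O6", 1.2228, 1.237), ("C6-O7", 1.4080, 1.428), ("C7-O8", 1.4165, 1.432),
    ("C8-O9", 1.4006, 1.428), ("C4-N1", 1.4895, 1.452), ("N1-C10", 1.4327, 1.330),
]

# Bond classes assigned by hand for this sketch (an editorial assumption).
CLASSES = {
    "C-C": ["C1-C2", "C2-C3", "C3-C4", "C4-C5", "C5-C6", "C6-C7", "C7-C8",
            "C1-C9", "C10-C11"],
    "C-O single": ["O1-C1", "C5-O1", "C1-O2", "C3-O5", "C6-O7", "C7-O8", "C8-O9"],
    "C=O/carboxyl": ["C9-O3", "C9-O4", "C10-O6"],
    "C-N": ["C4-N1", "N1-C10"],
}

deviation = {bond: pm3 - exp for bond, pm3, exp in TABLE_2}
for cls, bonds in CLASSES.items():
    d = [deviation[b] for b in bonds]
    print(f"{cls:13s} mean {mean(d):+.4f} A, range {min(d):+.4f} to {max(d):+.4f}")
```

For these values the C-C class averages about +0.02 Å (with C4-C5 as the one shorter bond), every C-O single bond is shorter in the PM3 geometry, and the C=O/carboxyl and C-N classes show mixed signs or larger deviations.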

Within this error limit, the optimized structure of the NeuNAc⁻ ion is essentially identical to that of the protonated form, except for the geometry of the carboxylate group: the C1-C9 distance increases to 1.612 Å, and the two C-O distances become 1.239 and 1.259 Å. For these studies on binding, the conformation of the molecule is the important result.

The  PM3  calculations  produce  good  agreement  between  the  optimized 
structure  and  the  experimentally  observed  molecular  framework. 

The calculated chair form is the same as that observed in the
crystal structure, although the ring atoms do not deviate from the
ring plane to the extent observed experimentally. The ring torsion
angles involving only carbon atoms are calculated to be very
similar (within 2°), while the ring torsions involving oxygen differ
to a greater extent (up to 11°), so the deviations from experiment
are greatest for atoms with polarized electron distributions. The
optimized structure retains the axial and equatorial substituents
observed experimentally; the equatorial positions of the N-acetyl
group at C4 and the adjacent equatorial hydroxyl at C3 are
essential for binding. The conformations of the side chains,
however, are not consistent with those observed in the crystal
structure. The positions of O3 and O4 with respect to torsion
around the C1-C9 bond, and the positions of O6 and C11 with respect
to torsion around the N1-C10 bond, are opposite to those observed
in the crystal structure. The torsion angle describing the
conformation of the glycerol tail (C6-C7-C8) is of opposite sign to
that observed in the crystal structure. Investigating these
structural details with other computational methods and other
semi-empirical approximations will be an important step in
computational studies of sialic acid binding.
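Because the comparison above rests on signed torsion angles and on deviations of a few degrees, it may help to make the conventions explicit. The sketch below computes a signed dihedral angle from four Cartesian positions using the common IUPAC sign convention (positive when the front bond must turn clockwise to eclipse the back bond, viewed along the central bond), and wraps calculated-minus-experimental differences into [-180°, 180°) so that angles on either side of the ±180° seam compare correctly. The function names and the coordinates in the usage lines are hypothetical test values, not NeuNAc data.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def dihedral(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in [-180, 180].

    Positive when, viewed along p1 -> p2, the front bond p1-p0 must turn
    clockwise to eclipse the back bond p2-p3 (IUPAC sign convention).
    """
    b1 = _sub(p1, p0)
    b2 = _sub(p2, p1)
    b3 = _sub(p3, p2)
    n1 = _cross(b1, b2)  # normal to the plane through p0, p1, p2
    n2 = _cross(b2, b3)  # normal to the plane through p1, p2, p3
    x = _dot(n1, n2)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    return math.degrees(math.atan2(y, x))


def torsion_difference(calc_deg, expt_deg):
    """Calculated minus experimental torsion, wrapped into [-180, 180)."""
    return (calc_deg - expt_deg + 180.0) % 360.0 - 180.0


if __name__ == "__main__":
    # Toy geometry: central bond along +z, front substituent along +x.
    p0, p1, p2 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0)
    print(dihedral(p0, p1, p2, (0.0, 1.0, 1.0)))  # back bond along +y
    print(torsion_difference(175.0, -178.0))      # close angles across the seam
```

A naive subtraction would report 353° for the last pair; the wrapped value keeps the deviation small and preserves its sign, which matters when a calculated torsion is described as having the opposite sign to experiment.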


6.2  Electronic Properties: Highest Occupied Molecular Orbital
(HOMO) and Lowest Unoccupied Molecular Orbital (LUMO).


The semi-empirical calculations allow rapid estimation
of electronic properties known to be important in binding.
Figures 4 through 7 display the HOMO and LUMO for NeuNAc (Figures 4
and 5) and NeuNAc⁻ (Figures 6 and 7). These orbitals indicate where
electrons are available for donation and where sites for electron
acceptance are located, which is indicative of the reactivity of
various portions of the molecule. The figures show that the
distribution of available electrons changes dramatically when the
acidic proton is removed from the carboxylic acid group. For the
protonated molecule, the HOMO electron density clusters in the
region of the N-acetyl group, whereas the LUMO density clusters
around the carboxylic acid group. With the removal of the proton,
these density clusters exchange positions: the HOMO density
clusters around the carboxylate ion, and the LUMO density clusters
around the N-acetyl group. This is relevant as an indicator of the
electrons available for hydrogen bonding.

Figure 4. Plot of the HOMO Surface, at ±0.1 Density Value, for Protonated Sialic Acid

Figure 5. Plot of the LUMO Surface, at ±0.1 Density Value, for Protonated Sialic Acid

Figure 6. Plot of the HOMO Surface, at ±0.1 Density Value, for Sialic Acid Ion

Figure 7. Plot of the LUMO Surface, at ±0.1 Density Value, for Sialic Acid Ion
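The qualitative statement that a frontier orbital "clusters" on a chemical group can be made quantitative. The sketch below assumes a closed-shell molecule, orbital energies sorted in ascending order, and an orthogonal basis of the kind used in NDDO methods such as PM3, AM1 and MNDO (where the overlap matrix is taken as the identity). It picks the HOMO and LUMO from the electron count and reports the fraction of each orbital's density on named atom groups. The coefficient matrix, atom labels, and group definitions are made-up illustrations, not output from the calculations described here, and the function names are hypothetical.

```python
def frontier_orbitals(energies, n_electrons):
    """Indices (HOMO, LUMO) for a closed-shell molecule.

    Assumes `energies` is sorted ascending and each occupied orbital
    holds two electrons.
    """
    if n_electrons % 2 != 0:
        raise ValueError("this sketch handles closed-shell (even-electron) cases only")
    homo = n_electrons // 2 - 1
    lumo = homo + 1
    if homo < 0 or lumo >= len(energies):
        raise ValueError("electron count inconsistent with the number of orbitals")
    return homo, lumo


def group_fractions(coeffs, orbital, basis_atom, groups):
    """Fraction of one molecular orbital's density on each named atom group.

    coeffs[mu][i] is the coefficient of basis function mu in orbital i;
    basis_atom[mu] is the atom that basis function mu is centred on.
    With an orthogonal (ZDO/NDDO-style) basis, the orbital density on a set
    of atoms is the sum of the squared coefficients of their basis functions.
    """
    total = sum(row[orbital] ** 2 for row in coeffs)
    fractions = {}
    for name, atoms in groups.items():
        on_group = sum(
            row[orbital] ** 2
            for row, atom in zip(coeffs, basis_atom)
            if atom in atoms
        )
        fractions[name] = on_group / total
    return fractions


# Made-up four-orbital example (NOT PM3 output for NeuNAc).
energies = [-12.0, -10.5, 1.2, 2.5]  # eV, ascending
coeffs = [                           # rows: basis functions, columns: MOs
    [0.5, 0.8, 0.1, 0.3],
    [0.5, 0.5, 0.2, 0.7],
    [0.5, 0.2, 0.7, 0.4],
    [0.5, 0.2, 0.7, 0.5],
]
basis_atom = ["N1", "C10", "C9", "O3"]  # one basis function per atom, for brevity
groups = {"N-acetyl": {"N1", "C10"}, "carboxyl": {"C9", "O3"}}

if __name__ == "__main__":
    homo, lumo = frontier_orbitals(energies, n_electrons=4)
    print("HOMO", group_fractions(coeffs, homo, basis_atom, groups))
    print("LUMO", group_fractions(coeffs, lumo, basis_atom, groups))
```

Applied to real coefficients for the acid and the anion, the same bookkeeping would show whether the HOMO and LUMO fractions swap between the N-acetyl and carboxyl groups as the isosurface plots suggest.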

These two chemical groups are most involved in hydrogen bonding
in the WGA1 site.^^ The molecule exists as the negative ion at
physiological pH, so hydrogen bonding at the carboxylate group is
expected. The complete localization of the LUMO at the N-acetyl
group would seem contrary to its involvement in the hydrogen-bond
network. However, the carboxylate group binds very tightly to the
OH hydrogen of Ser 114, such that "this contact is very close in
monomer If of NLIl where a strong electron density connection is
present." The dramatic shifts between the ionic and acidic forms
suggest that partial binding at this close contact would induce a
mixed distribution of HOMO and LUMO density over both the
carboxylate and N-acetyl groups.

6.3  Electrostatic  Potentials. 

The electrostatic potential V(r) created by the electrons
and nuclei of a molecule is a well-established tool for
interpreting and predicting molecular reactive behavior toward
electrophiles. It has been used extensively in the study of
biological recognition interactions.^^ The value of the
electrostatic potential at any point r is given rigorously by
equation (1).


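$$V(\mathbf{r}) = \sum_{A} \frac{Z_A}{|\mathbf{R}_A - \mathbf{r}|} - \int \frac{\rho(\mathbf{r}')\,d\mathbf{r}'}{|\mathbf{r}' - \mathbf{r}|} \qquad (1)$$

where

$Z_A$ = charge of nucleus A, located at $\mathbf{R}_A$
$\rho(\mathbf{r}')$ = electron density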
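In practice the integral in equation (1) is evaluated numerically. A minimal sketch, assuming the electron density is available as values on a uniform grid of cells of known volume and that all quantities are in atomic units, replaces the integral by a sum over grid cells. The function and argument names are hypothetical, and the numbers in the usage lines are toy values chosen so the result is easy to check by hand. `math.dist` requires Python 3.8 or later.

```python
import math


def electrostatic_potential(point, nuclei, density_samples, cell_volume):
    """Evaluate equation (1) at `point` (atomic units).

    nuclei: iterable of (Z_A, (x, y, z)) nuclear charges and positions.
    density_samples: iterable of (rho, (x, y, z)) electron density at
        grid-cell centres; the integral is approximated by
        sum(rho * cell_volume / distance).
    A grid cell centred exactly on `point` is skipped (a crude way to
    avoid the singular term). Evaluating at a nucleus raises
    ZeroDivisionError, since the potential diverges there.
    """
    v = 0.0
    for z_a, r_a in nuclei:
        v += z_a / math.dist(point, r_a)      # nuclear term, positive
    for rho, r_k in density_samples:
        d = math.dist(point, r_k)
        if d > 0.0:
            v -= rho * cell_volume / d        # electronic term, negative
    return v


if __name__ == "__main__":
    # Toy system: Z = 2 at the origin, one unit of electron density at z = 2.
    nuclei = [(2.0, (0.0, 0.0, 0.0))]
    density = [(1.0, (0.0, 0.0, 2.0))]
    print(electrostatic_potential((0.0, 0.0, 1.0), nuclei, density, 1.0))
```

Real surface calculations use fine grids or analytic integrals over the basis functions; the sum here only illustrates how the two terms of equation (1) compete, with regions dominated by the electronic term giving the attractive negative potential discussed below.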
The calculation of the electrostatic potential surface for NeuNAc
indicates charge distributions suitable for hydrogen bonding.
Based on the crystal structure of the isolated molecule, this
electrostatic potential surface suggests that NeuNAc is a highly
amphipathic molecule, with most of the attractive negative
potential concentrated in a pocket on one side of the molecule.

The preliminary results of the semi-empirical calculations raise
important questions about the mechanisms of binding for NeuNAc.
From the crystal structure of the WGA complex,^ clear indications
about the conformation of the molecule are derived from
intermolecular distances between NeuNAc and the oxygen atoms of the
protein residues in the binding site. The hydrogen-bond contacts
indicate that the trans conformation of the peptide linkage in the
N-acetyl group, as observed in the crystal structure,^ is
important; this conformation is not well represented by the
semi-empirical results, which yield a cis conformation with the
PM3, AM1, and MNDO Hamiltonians. The surface also shows a clearly
distinct potential around the methyl group of the N-acetyl group,
whose nonpolar contacts with a tyrosine, Tyr 73, are known to be
important for binding. The potential surface further shows that
the attractive potential pocket is located on the side of the
molecule associated with nonpolar contacts with the side chains
of Tyr 64 and Tyr 66. This raises questions about the role of
electrostatic charge in binding, since NeuNAc is a negatively
charged species at physiological pH. The electrostatic potential
surface for the negative ion shows strong variations between
polar and nonpolar regions of the molecule despite its overall
charge. The glycerol tail is known to be flexible and to adopt
different orientations in the binding site.^^ Changes in the
electrostatic potential that depend on the conformation of this
group should help answer questions about its structural
importance for binding.
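The trans versus cis distinction for the N-acetyl peptide linkage is a statement about the amide torsion angle ω about the N-C(=O) bond; in the atom numbering used here this is presumably C4-N1-C10-C11, with about 180° for trans and about 0° for cis. A small classifier makes the comparison between calculated and crystal conformations explicit. It assumes ω is already available in degrees, and the ±30° window and the function name are illustrative choices, not values taken from this study.

```python
def amide_conformation(omega_deg, tolerance=30.0):
    """Classify an amide torsion angle omega given in degrees.

    The angle is first wrapped into [-180, 180). It is called 'cis' if it
    lies within `tolerance` of 0 degrees, 'trans' if within `tolerance` of
    +/-180 degrees, and 'twisted' otherwise. The 30-degree default window
    is an illustrative assumption.
    """
    w = (omega_deg + 180.0) % 360.0 - 180.0
    if abs(w) <= tolerance:
        return "cis"
    if 180.0 - abs(w) <= tolerance:
        return "trans"
    return "twisted"


if __name__ == "__main__":
    # Hypothetical omega values, not results from the PM3, AM1, or MNDO runs.
    for omega in (178.0, -175.0, 5.0, 90.0):
        print(omega, amide_conformation(omega))
```

Wrapping first means that values such as -175° and 185° are treated as the same trans geometry, so the label depends only on the conformation and not on how a program reports the angle.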


7.  CONCLUSIONS 

Molecular modeling techniques are shown to be suitable for
studying a wide range of biological processes. Fundamental
components of biochemical systems can be simulated with
computational methods. These simulations can be combined to
provide qualitative visual representations of biological materials
and structures and quantitative evaluations of mechanisms and
energetics. New technologies will allow simultaneous simulation of
biomolecular systems and evaluation of their interactions using
computational techniques.^* These methods will provide new
information about the fundamental physics and chemistry of living
systems that is not readily available with current experimental
techniques.

Preliminary  computational  studies  on  sialic  acid 
illustrate  the  power  of  molecular  modeling  and  visualization  in 
studying  fundamental  chemistry  of  biological  processes. 
Semi-empirical methods appear suitable for large molecules such
as sialic acid. Electrostatic potential surfaces correlate well
with proposed mechanisms of binding to proteins. The
representations of electronic structure give new insight into the
important role of molecular charge in the interactions of sialic
acid with proteins such as wheat germ agglutinin. Molecular
modeling is a useful and important tool for obtaining information
that is difficult to obtain experimentally and for developing new
lines of inquiry for further investigation.

Further applications of modeling methods can yield
important information about biological materials (e.g., lipid
membranes, complex carbohydrates, and protein complexes). New
structural information and computational technology will
dramatically increase the size and scope of molecular modeling
applications and provide a new basis for interaction with
experimental techniques.